Memory management for data streams subject to concept drift

Authors

  • Pierre-Xavier Loeffel
  • Christophe Marsala
  • Marcin Detyniecki
Abstract

Learning on data streams subject to concept drift is a challenging task. A successful algorithm must keep memory consumption constant regardless of the amount of data processed while retaining good adaptation and prediction capabilities by effectively selecting which observations to store in memory. We claim that, instead of using a temporal window to discard observations by time stamp, it is better to retain the observations that minimize the change in the output prediction and in the learned rule relative to the full-memory case. Experimental results for the Droplets algorithm on 6 artificial and semi-artificial datasets reproducing various types of drift back this claim.
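
The selection idea in the abstract can be illustrated with a minimal sketch. This is not the Droplets algorithm, which the abstract does not describe. It assumes a 1-nearest-neighbour predictor over one-dimensional inputs, and the names `predict`, `prediction_change`, and `insert` are hypothetical. When the memory exceeds a fixed capacity, the sketch evicts the stored observation whose removal changes the predictions the least, rather than the oldest one.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y): a 1-D input and its target (assumed layout)


def predict(memory: List[Point], x: float) -> float:
    """1-nearest-neighbour prediction from the stored observations."""
    nearest = min(memory, key=lambda p: abs(p[0] - x))
    return nearest[1]


def prediction_change(memory: List[Point], idx: int) -> float:
    """Total change in predictions at the stored inputs if memory[idx] is dropped."""
    reduced = memory[:idx] + memory[idx + 1:]
    return sum(abs(predict(memory, x) - predict(reduced, x)) for x, _ in memory)


def insert(memory: List[Point], obs: Point, capacity: int) -> None:
    """Store obs; if over capacity, evict the least influential stored point."""
    memory.append(obs)
    if len(memory) > capacity:
        # Assumption of this sketch: the newest observation is never evicted,
        # so the memory can still adapt after a drift.
        candidates = range(len(memory) - 1)
        victim = min(candidates, key=lambda i: prediction_change(memory, i))
        del memory[victim]


memory: List[Point] = []
for obs in [(0.0, 1.0), (0.1, 1.0), (5.0, 2.0), (10.0, 3.0)]:
    insert(memory, obs, capacity=3)
```

In this toy stream the two observations near x = 0 are redundant, so one of them is evicted, whereas a purely temporal window would have kept both only if they happened to be the most recent. The cost of each eviction here is quadratic in the memory size, which is acceptable for a sketch but not for a real stream.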

Similar resources

Detecting Concept Drift in Data Stream Using Semi-Supervised Classification

A data stream is a sequence of data generated from various information sources at high speed and in high volume. Classifying data streams faces three challenges: unlimited length, online processing, and concept drift. In related research, the challenge of unlimited stream length is commonly met by dividing the stream into fixed-size windows or by gradual forgetting. Concept drift refer...

CPF: Concept Profiling Framework for Recurring Drifts in Data Streams

We propose the Concept Profiling Framework (CPF), a metalearner that uses a concept drift detector and a collection of classification models to perform effective classification on data streams with recurrent concept drifts, through relating models by similarity of their classifying behaviour. We introduce a memory-efficient version of our framework and show that it can operate faster and with l...

Learning Flexible Concepts from Streams of Examples: FLORA 2

FLORA2 is a program for supervised learning of concepts that are subject to concept drift. The learning process is incremental in that the examples are processed one by one. A special feature of our program consists in keeping in memory a subset of examples (a window). In time, new examples are being added to the window while other ones are considered outdated and are forgotten. In order to tra...

Self-Adjusting Memory: How to Deal with Diverse Drift Types

Data Mining in non-stationary data streams is particularly relevant in the context of the Internet of Things and Big Data. Its challenges arise from fundamentally different drift types violating assumptions of data independence or stationarity. Available methods often struggle with certain forms of drift or require unavailable a priori task knowledge. We propose the Self-Adjusting Memory (SAM) ...

A novel concept drift detection method in data streams using ensemble classifiers

Concept drift, a change in the underlying distribution from which data points are drawn, is an inevitable phenomenon in data streams. Due to the growing number of data stream applications, such as network intrusion detection, weather forecasting, and detection of unconventional behavior in financial transactions, numerous studies have recently been conducted in the area of concept drift detecti...

Publication date: 2016